CLARIN and Free Open Source Finite-State Tools

Authors

  • Kimmo Koskenniemi
  • Anssi Yli-Jyrä
Abstract

CLARIN stands for Common Language Resources and Technologies Research Infrastructure, and it is one of the 35 infrastructure projects listed in the ESFRI roadmap of European research infrastructures covering various areas. CLARIN has now entered its three-year preparatory phase under a grant from the EU Commission. The preparatory phase has 32 partner organizations (see www.clarin.eu for more details). Europe has a large number of language resources: text and speech corpora with possible annotations, lexical materials, standards and norms, and programs for parsing and processing such data. These resources are fragmented in several ways, and it is difficult:

Similar resources

Finite-State Spell-Checking with Weighted Language and Error Models—Building and Evaluating Spell-Checkers with Wikipedia as Corpus

In this paper we present simple methods for construction and evaluation of finite-state spell-checking tools using an existing finite-state lexical automaton, freely available finite-state tools and Internet corpora acquired from projects such as Wikipedia. As an example, we use a freely available open-source implementation of Finnish morphology, made with traditional finite-state morphology to...

Porting Basque Morphological Grammars to foma, an Open-Source Tool

Basque is a morphologically rich language, of which several finite-state morphological descriptions have been constructed, primarily using the Xerox/PARC finite-state tools. In this paper we describe the process of porting a previous description of Basque morphology to foma, an open-source finite-state toolkit compatible with Xerox tools, provide a comparison of the two tools, and contrast the ...

OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language

Finite-state methods are well established in language and speech processing. OpenFst (available from www.openfst.org) is a free and open-source software library for building and using finite automata, in particular, weighted finite-state transducers (FSTs). This tutorial is an introduction to weighted finite-state transducers and their uses in speech and language processing. While there are othe...

Using HFST for Creating Computational Linguistic Applications

HFST – Helsinki Finite-State Technology (hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently collects some of the most important finite-state tools for creating morphologies and spellcheckers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistic...

HFST Tools for Morphology - An Efficient Open-Source Package for Construction of Morphological Analyzers

Morphological analysis of a wide range of languages can be implemented efficiently using finite-state transducer technologies. Over the last 30 years, a number of attempts have been made to create tools for computational morphologies. The two main competing approaches have been parallel vs. cascaded rule application. The parallel rule application was originally introduced by Koskenniemi [1983] ...

Publication date: 2008